Overview

Dataset statistics

Number of variables: 27
Number of observations: 9006
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 MiB
Average record size in memory: 216.0 B

Variable types

CAT: 18
NUM: 7
DATE: 1
BOOL: 1

Warnings

crash_time has a high cardinality: 1259 distinct values (High cardinality)
on_street_name has a high cardinality: 2525 distinct values (High cardinality)
off_street_name has a high cardinality: 1694 distinct values (High cardinality)
number_of_cyclist_killed is highly correlated with number_of_persons_killed (High correlation)
number_of_persons_killed is highly correlated with number_of_cyclist_killed (High correlation)
df_index has unique values (Unique)
number_of_persons_injured has 6000 (66.6%) zeros (Zeros)
number_of_pedestrians_injured has 8342 (92.6%) zeros (Zeros)
number_of_motorist_injured has 7182 (79.7%) zeros (Zeros)

Reproduction

Analysis started: 2020-12-11 10:22:58.586598
Analysis finished: 2020-12-11 10:24:10.568252
Duration: 1 minute and 11.98 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 9006
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5131.27515
Minimum: 2
Maximum: 9999
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 763.25
Q1: 2694.25
median: 5134.5
Q3: 7571.75
95-th percentile: 9519.75
Maximum: 9999
Range: 9997
Interquartile range (IQR): 4877.5

Descriptive statistics

Standard deviation: 2813.937984
Coefficient of variation (CV): 0.5483896111
Kurtosis: -1.19457331
Mean: 5131.27515
Median Absolute Deviation (MAD): 2439
Skewness: -8.643233311e-05
Sum: 46212264
Variance: 7918246.977
Monotonicity: Strictly increasing
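
Several of the figures above are derived from one another. The following is a minimal sketch, not the code pandas-profiling runs, that recomputes the derived values from the base values printed in this df_index summary. The variable names are chosen here for illustration only.

```python
import math

# Base values copied from the df_index summary in this report.
mean = 5131.27515
std = 2813.937984
q1, q3 = 2694.25, 7571.75
minimum, maximum = 2, 9999

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.10f} variance={variance:.3f} IQR={iqr} range={value_range}")
```

Up to rounding, the results agree with the Coefficient of variation, Variance, Interquartile range and Range entries listed for df_index.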
Histogram with fixed size bins (bins=50)
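
As an illustration of what "fixed size bins" means, here is a small pure-Python sketch that splits the value range into equal-width intervals and counts the values in each. The report's own plots were produced with Matplotlib; the function name `fixed_width_histogram` and the sample data are assumptions made for this example.

```python
def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value falls into the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


sample = list(range(11))  # hypothetical data: 0, 1, ..., 10
counts = fixed_width_histogram(sample, bins=5)
print(counts)
```

With bins=50, as in this report, the same idea applies with fifty intervals between the column minimum and maximum.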
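Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
3387 | 1 | < 0.1%
9542 | 1 | < 0.1%
3395 | 1 | < 0.1%
1346 | 1 | < 0.1%
7489 | 1 | < 0.1%
5440 | 1 | < 0.1%
9534 | 1 | < 0.1%
1338 | 1 | < 0.1%
7497 | 1 | < 0.1%
Other values (8996) | 8996 | 99.9%

Minimum 5 values
Value | Count | Frequency (%)
2 | 1 | < 0.1%
5 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
12 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
9999 | 1 | < 0.1%
9998 | 1 | < 0.1%
9997 | 1 | < 0.1%
9996 | 1 | < 0.1%
9995 | 1 | < 0.1%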

crash_date
Date

Distinct: 116
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB
Minimum: 2017-01-17 00:00:00
Maximum: 2020-12-04 00:00:00
Histogram with fixed size bins (bins=50)

crash_time
Categorical

HIGH CARDINALITY

Distinct: 1259
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
00:00:00 | 147 | 1.6%
15:00:00 | 97 | 1.1%
13:00:00 | 96 | 1.1%
17:00:00 | 95 | 1.1%
19:00:00 | 93 | 1.0%
14:00:00 | 93 | 1.0%
12:00:00 | 92 | 1.0%
18:00:00 | 91 | 1.0%
16:00:00 | 89 | 1.0%
14:30:00 | 85 | 0.9%
Other values (1249) | 8028 | 89.1%

Frequencies of value counts

Unique

Unique: 259
Unique (%): 2.9%

Histogram of lengths of the category

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

borough
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unknown | 2974 | 33.0%
Brooklyn | 2079 | 23.1%
Queens | 1623 | 18.0%
Bronx | 1245 | 13.8%
Manhattan | 872 | 9.7%
Staten Island | 213 | 2.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 13
Median length: 7
Mean length: 7.109704641
Min length: 5

zip_code
Real number (ℝ)

Distinct: 182
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7300.198867
Minimum: -1
Maximum: 11697
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.4 KiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: -1
median: 10458
Q3: 11223
95-th percentile: 11422
Maximum: 11697
Range: 11698
Interquartile range (IQR): 11224

Descriptive statistics

Standard deviation: 5144.080693
Coefficient of variation (CV): 0.7046493919
Kurtosis: -1.48043454
Mean: 7300.198867
Median Absolute Deviation (MAD): 779
Skewness: -0.7013272898
Sum: 65745591
Variance: 26461566.17
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
-1 | 2974 | 33.0%
11207 | 152 | 1.7%
11236 | 121 | 1.3%
11208 | 115 | 1.3%
11212 | 108 | 1.2%
10467 | 95 | 1.1%
11226 | 92 | 1.0%
11203 | 89 | 1.0%
11385 | 88 | 1.0%
10457 | 84 | 0.9%
Other values (172) | 5088 | 56.5%

Minimum 5 values
Value | Count | Frequency (%)
-1 | 2974 | 33.0%
10000 | 4 | < 0.1%
10001 | 28 | 0.3%
10002 | 44 | 0.5%
10003 | 27 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
11697 | 2 | < 0.1%
11694 | 4 | < 0.1%
11693 | 12 | 0.1%
11692 | 8 | 0.1%
11691 | 48 | 0.5%

latitude
Real number (ℝ≥0)

Distinct: 7248
Distinct (%): 80.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.72793471
Minimum: 40.507267
Maximum: 40.9109
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.4 KiB

Quantile statistics

Minimum: 40.507267
5-th percentile: 40.599623
Q1: 40.6659715
median: 40.7168915
Q3: 40.8020705
95-th percentile: 40.8657445
Maximum: 40.9109
Range: 0.403633
Interquartile range (IQR): 0.136099

Descriptive statistics

Standard deviation: 0.08376902426
Coefficient of variation (CV): 0.00205679529
Kurtosis: -0.8907520713
Mean: 40.72793471
Median Absolute Deviation (MAD): 0.0599115
Skewness: 0.1615208588
Sum: 366795.78
Variance: 0.007017249426
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
40.861862 | 13 | 0.1%
40.84519 | 9 | 0.1%
40.84518 | 9 | 0.1%
40.675735 | 8 | 0.1%
40.76635 | 8 | 0.1%
40.820305 | 8 | 0.1%
40.65616 | 7 | 0.1%
40.651974 | 6 | 0.1%
40.733536 | 6 | 0.1%
40.66496 | 6 | 0.1%
Other values (7238) | 8926 | 99.1%

Minimum 5 values
Value | Count | Frequency (%)
40.507267 | 1 | < 0.1%
40.511734 | 1 | < 0.1%
40.516624 | 1 | < 0.1%
40.519127 | 1 | < 0.1%
40.519722 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
40.9109 | 1 | < 0.1%
40.91076 | 1 | < 0.1%
40.91038 | 1 | < 0.1%
40.91032 | 1 | < 0.1%
40.909607 | 1 | < 0.1%

longitude
Real number (ℝ)

Distinct: 6864
Distinct (%): 76.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.91336268
Minimum: -74.23878
Maximum: -73.70174
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.4 KiB

Quantile statistics

Minimum: -74.23878
5-th percentile: -74.02221875
Q1: -73.9586965
median: -73.91749
Q3: -73.8680625
95-th percentile: -73.7626825
Maximum: -73.70174
Range: 0.53704
Interquartile range (IQR): 0.090634

Descriptive statistics

Standard deviation: 0.08299945637
Coefficient of variation (CV): -0.001122928972
Kurtosis: 1.211668832
Mean: -73.91336268
Median Absolute Deviation (MAD): 0.045295
Skewness: -0.3017397742
Sum: -665663.7443
Variance: 0.006888909757
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
-73.91282 | 14 | 0.2%
-73.9112 | 11 | 0.1%
-73.89686 | 9 | 0.1%
-73.91417 | 9 | 0.1%
-73.89083 | 8 | 0.1%
-73.76736 | 7 | 0.1%
-73.919106 | 7 | 0.1%
-73.897736 | 7 | 0.1%
-73.94194 | 7 | 0.1%
-73.86542 | 6 | 0.1%
Other values (6854) | 8921 | 99.1%

Minimum 5 values
Value | Count | Frequency (%)
-74.23878 | 1 | < 0.1%
-74.23591 | 1 | < 0.1%
-74.235115 | 1 | < 0.1%
-74.23486 | 1 | < 0.1%
-74.230446 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
-73.70174 | 1 | < 0.1%
-73.70212 | 1 | < 0.1%
-73.70259 | 1 | < 0.1%
-73.70362 | 1 | < 0.1%
-73.70631 | 1 | < 0.1%

on_street_name
Categorical

HIGH CARDINALITY

Distinct: 2525
Distinct (%): 28.0%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Belt Parkway | 168 | 1.9%
Broadway | 99 | 1.1%
Brooklyn Queens Expressway | 90 | 1.0%
Long Island Expressway | 83 | 0.9%
Cross Bronx Expy | 82 | 0.9%
Atlantic Avenue | 82 | 0.9%
Major Deegan Expressway | 79 | 0.9%
Fdr Drive | 77 | 0.9%
Grand Central Pkwy | 77 | 0.9%
3 Avenue | 70 | 0.8%
Other values (2515) | 8099 | 89.9%

Frequencies of value counts

Unique

Unique: 1329
Unique (%): 14.8%

Histogram of lengths of the category

Length

Max length: 32
Median length: 14
Mean length: 14.46824339
Min length: 6

off_street_name
Categorical

HIGH CARDINALITY

Distinct: 1694
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unknown | 4869 | 54.1%
3 Avenue | 39 | 0.4%
Broadway | 38 | 0.4%
2 Avenue | 34 | 0.4%
4 Avenue | 24 | 0.3%
5 Avenue | 24 | 0.3%
Queens Boulevard | 20 | 0.2%
Atlantic Avenue | 20 | 0.2%
Park Avenue | 19 | 0.2%
Linden Boulevard | 17 | 0.2%
Other values (1684) | 3902 | 43.3%

Frequencies of value counts

Unique

Unique: 884
Unique (%): 9.8%

Histogram of lengths of the category

Length

Max length: 29
Median length: 7
Mean length: 9.774483678
Min length: 6

number_of_persons_injured
Real number (ℝ≥0)

ZEROS

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4384854541
Minimum: 0
Maximum: 10
Zeros: 6000
Zeros (%): 66.6%
Memory size: 70.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7623439804
Coefficient of variation (CV): 1.738584423
Kurtosis: 13.39461939
Mean: 0.4384854541
Median Absolute Deviation (MAD): 0
Skewness: 2.781649591
Sum: 3949
Variance: 0.5811683444
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 6000 | 66.6%
1 | 2401 | 26.7%
2 | 400 | 4.4%
3 | 124 | 1.4%
4 | 53 | 0.6%
5 | 15 | 0.2%
6 | 6 | 0.1%
7 | 5 | 0.1%
10 | 1 | < 0.1%
8 | 1 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 6000 | 66.6%
1 | 2401 | 26.7%
2 | 400 | 4.4%
3 | 124 | 1.4%
4 | 53 | 0.6%

Maximum 5 values
Value | Count | Frequency (%)
10 | 1 | < 0.1%
8 | 1 | < 0.1%
7 | 5 | 0.1%
6 | 6 | 0.1%
5 | 15 | 0.2%

number_of_persons_killed
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
0 | 9002 | > 99.9%
3 | 4 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

number_of_pedestrians_injured
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.07583833
Minimum: 0
Maximum: 4
Zeros: 8342
Zeros (%): 92.6%
Memory size: 70.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2742315335
Coefficient of variation (CV): 3.616001744
Kurtosis: 16.15873294
Mean: 0.07583833
Median Absolute Deviation (MAD): 0
Skewness: 3.732437796
Sum: 683
Variance: 0.07520293399
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=5)

Common values
Value | Count | Frequency (%)
0 | 8342 | 92.6%
1 | 648 | 7.2%
2 | 14 | 0.2%
4 | 1 | < 0.1%
3 | 1 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 8342 | 92.6%
1 | 648 | 7.2%
2 | 14 | 0.2%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
4 | 1 | < 0.1%
3 | 1 | < 0.1%
2 | 14 | 0.2%
1 | 648 | 7.2%
0 | 8342 | 92.6%

number_of_pedestrians_killed
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
0 | 8996 | 99.9%
1 | 9 | 0.1%
2 | 1 | < 0.1%

Frequencies of value counts

Unique

Unique: 1
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

number_of_cyclist_injured
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
0 | 8481 | 94.2%
1 | 511 | 5.7%
2 | 14 | 0.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

number_of_cyclist_killed
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
0 | 9002 | > 99.9%
1 | 4 | < 0.1%

number_of_motorist_injured
Real number (ℝ≥0)

ZEROS

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3027981346
Minimum: 0
Maximum: 10
Zeros: 7182
Zeros (%): 79.7%
Memory size: 70.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7318615064
Coefficient of variation (CV): 2.416994766
Kurtosis: 18.49006004
Mean: 0.3027981346
Median Absolute Deviation (MAD): 0
Skewness: 3.555934948
Sum: 2727
Variance: 0.5356212645
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 7182 | 79.7%
1 | 1254 | 13.9%
2 | 368 | 4.1%
3 | 123 | 1.4%
4 | 51 | 0.6%
5 | 15 | 0.2%
6 | 6 | 0.1%
7 | 5 | 0.1%
10 | 1 | < 0.1%
8 | 1 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 7182 | 79.7%
1 | 1254 | 13.9%
2 | 368 | 4.1%
3 | 123 | 1.4%
4 | 51 | 0.6%

Maximum 5 values
Value | Count | Frequency (%)
10 | 1 | < 0.1%
8 | 1 | < 0.1%
7 | 5 | 0.1%
6 | 6 | 0.1%
5 | 15 | 0.2%

number_of_motorist_killed
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
0 | 8995 | 99.9%
1 | 9 | 0.1%
2 | 2 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

contributing_factor_vehicle_1
Categorical

Distinct: 50
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 2504 | 27.8%
Driver Inattention/Distraction | 2267 | 25.2%
Failure To Yield Right-Of-Way | 597 | 6.6%
Following Too Closely | 481 | 5.3%
Passing Too Closely | 324 | 3.6%
Passing Or Lane Usage Improper | 311 | 3.5%
Unsafe Speed | 309 | 3.4%
Backing Unsafely | 269 | 3.0%
Other Vehicular | 249 | 2.8%
Traffic Control Disregarded | 240 | 2.7%
Other values (40) | 1455 | 16.2%

Frequencies of value counts

Unique

Unique: 8
Unique (%): 0.1%

Histogram of lengths of the category

Length

Max length: 53
Median length: 20
Mean length: 20.91605596
Min length: 5

contributing_factor_vehicle_2
Categorical

Distinct: 28
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8032 | 89.2%
Driver Inattention/Distraction | 361 | 4.0%
Other Vehicular | 106 | 1.2%
Following Too Closely | 93 | 1.0%
Failure To Yield Right-Of-Way | 67 | 0.7%
Passing Too Closely | 46 | 0.5%
Passing Or Lane Usage Improper | 42 | 0.5%
Unsafe Speed | 36 | 0.4%
Traffic Control Disregarded | 34 | 0.4%
Unsafe Lane Changing | 24 | 0.3%
Other values (18) | 165 | 1.8%

Frequencies of value counts

Unique

Unique: 4
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 53
Median length: 11
Mean length: 12.52442816
Min length: 11

contributing_factor_vehicle_3
Categorical

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8942 | 99.3%
Other Vehicular | 21 | 0.2%
Driver Inattention/Distraction | 14 | 0.2%
Following Too Closely | 14 | 0.2%
Obstruction/Debris | 4 | < 0.1%
Driver Inexperience | 2 | < 0.1%
Unsafe Speed | 2 | < 0.1%
Aggressive Driving/Road Rage | 1 | < 0.1%
Pavement Slippery | 1 | < 0.1%
Passing Or Lane Usage Improper | 1 | < 0.1%
Other values (4) | 4 | < 0.1%

Frequencies of value counts

Unique

Unique: 7
Unique (%): 0.1%

Histogram of lengths of the category

Length

Max length: 30
Median length: 11
Mean length: 11.0700644
Min length: 11

contributing_factor_vehicle_4
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8986 | 99.8%
Other Vehicular | 9 | 0.1%
Driver Inattention/Distraction | 4 | < 0.1%
Following Too Closely | 4 | < 0.1%
Driver Inexperience | 1 | < 0.1%
Passing Or Lane Usage Improper | 1 | < 0.1%
Obstruction/Debris | 1 | < 0.1%

Frequencies of value counts

Unique

Unique: 3
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 30
Median length: 11
Mean length: 11.0206529
Min length: 11

contributing_factor_vehicle_5
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8999 | 99.9%
Following Too Closely | 4 | < 0.1%
Driver Inattention/Distraction | 1 | < 0.1%
Other Vehicular | 1 | < 0.1%
Obstruction/Debris | 1 | < 0.1%

Frequencies of value counts

Unique

Unique: 3
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 30
Median length: 11
Mean length: 11.0077726
Min length: 11

vehicle_type_code_1
Categorical

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Sedan | 4380 | 48.6%
Station Wagon/Sport Utility Vehicle | 3032 | 33.7%
Other | 313 | 3.5%
Taxi | 262 | 2.9%
Pick-Up Truck | 204 | 2.3%
Box Truck | 157 | 1.7%
Bike | 131 | 1.5%
Bus | 121 | 1.3%
Unspecified | 112 | 1.2%
Tractor Truck Diesel | 81 | 0.9%
Other values (4) | 213 | 2.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 35
Median length: 5
Mean length: 15.53131246
Min length: 3

vehicle_type_code_2
Categorical

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 3149 | 35.0%
Sedan | 2620 | 29.1%
Station Wagon/Sport Utility Vehicle | 1787 | 19.8%
Bike | 334 | 3.7%
Other | 268 | 3.0%
Box Truck | 167 | 1.9%
Pick-Up Truck | 121 | 1.3%
Taxi | 108 | 1.2%
Bus | 100 | 1.1%
E-Scooter | 91 | 1.0%
Other values (4) | 261 | 2.9%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 35
Median length: 11
Mean length: 13.34710193
Min length: 3

vehicle_type_code_3
Categorical

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8152 | 90.5%
Sedan | 428 | 4.8%
Station Wagon/Sport Utility Vehicle | 366 | 4.1%
Other | 14 | 0.2%
Pick-Up Truck | 13 | 0.1%
Taxi | 10 | 0.1%
Box Truck | 6 | 0.1%
Bus | 5 | 0.1%
Tractor Truck Diesel | 4 | < 0.1%
Bike | 3 | < 0.1%
Other values (2) | 5 | 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 35
Median length: 11
Mean length: 11.66899845
Min length: 3

vehicle_type_code_4
Categorical

Distinct: 11
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8768 | 97.4%
Station Wagon/Sport Utility Vehicle | 111 | 1.2%
Sedan | 110 | 1.2%
Pick-Up Truck | 6 | 0.1%
Bike | 3 | < 0.1%
Other | 2 | < 0.1%
Taxi | 2 | < 0.1%
Box Truck | 1 | < 0.1%
Motorcycle | 1 | < 0.1%
Tractor Truck Diesel | 1 | < 0.1%

Frequencies of value counts

Unique

Unique: 4
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 35
Median length: 11
Mean length: 11.21840995
Min length: 3

vehicle_type_code_5
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.4 KiB

Value | Count | Frequency (%)
Unspecified | 8920 | 99.0%
Sedan | 42 | 0.5%
Station Wagon/Sport Utility Vehicle | 37 | 0.4%
Pick-Up Truck | 2 | < 0.1%
Taxi | 2 | < 0.1%
Other | 1 | < 0.1%
Bike | 1 | < 0.1%
Motorcycle | 1 | < 0.1%

Frequencies of value counts

Unique

Unique: 3
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 35
Median length: 11
Mean length: 11.0679547
Min length: 4

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
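
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself ran; the function name `pearson_r` and the sample lists are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Shifting and rescaling either variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.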

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
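
A minimal sketch of this rank-based calculation follows, assuming tied values receive the average of their ranks (a common convention, not something this report states). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are chosen here for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic but nonlinear pair, ρ reaches its maximum while Pearson's r stays below 1, which is the difference the description above points out.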

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
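
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition, with tied pairs counted as neither concordant nor discordant. Note that this is an assumption about the variant: libraries such as SciPy default to tau-b, which adjusts for ties, so results can differ on tied data. The function name `kendall_tau_a` is chosen for this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair relative to x
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.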

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
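
As a sketch of how a bias-corrected Cramér's V can be computed from a contingency table, the function below follows the commonly cited form of Bergsma's correction. It is an assumption that this matches the report's implementation detail for detail; the function name `cramers_v_corrected` and the example tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections to phi^2 and to the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


perfect = [[50, 0], [0, 50]]       # the row category fully determines the column
independent = [[25, 25], [25, 25]]  # no association at all
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The two example tables sit at the extremes of the 0 to 1 range described above.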

Missing values


Sample

First rows

(index) | df_index | crash_date | crash_time | borough | zip_code | latitude | longitude | on_street_name | off_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | vehicle_type_code_4 | vehicle_type_code_5
0 | 2 | 2020-12-03 | 13:37:00 | Unknown | -1 | 40.798504 | -73.967125 | West 103 Street | Unknown | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified
1 | 5 | 2020-12-02 | 19:00:00 | Unknown | -1 | 40.731167 | -73.709940 | 256 Street | 87 Avenue | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Taxi | Unspecified | Unspecified | Unspecified | Unspecified
2 | 9 | 2020-11-30 | 09:40:00 | Queens | 11375 | 40.735550 | -73.850970 | 62-60 108 Street | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified
3 | 10 | 2020-11-29 | 05:45:00 | Unknown | -1 | 40.701527 | -73.989570 | Brooklyn Queens Expressway | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Fell Asleep | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified
4 | 12 | 2020-11-26 | 23:30:00 | Unknown | -1 | 40.700108 | -73.953830 | Wallabout Street | Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Sedan | Unspecified | Unspecified | Unspecified
5 | 18 | 2020-11-23 | 11:28:00 | Brooklyn | 11215 | 40.668293 | -73.979240 | 6 Street | Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified
6 | 21 | 2020-11-22 | 20:10:00 | Unknown | -1 | 40.624640 | -74.141670 | Forest Avenue | Decker Avenue | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified
7 | 27 | 2020-11-20 | 12:00:00 | Unknown | -1 | 40.677483 | -73.930330 | Utica Avenue | Unknown | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Other Vehicular | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Taxi | Bike | Unspecified | Unspecified | Unspecified
8 | 33 | 2020-11-18 | 11:00:00 | Manhattan | 10010 | 40.736706 | -73.978220 | East 23 Street | Unknown | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified
9 | 35 | 2017-01-17 | 03:02:00 | Unknown | -1 | 40.608757 | -74.038086 | Verrazano Bridge Lower | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified

Last rows

(row) | df_index | crash_date | crash_time | borough | zip_code | latitude | longitude | on_street_name | off_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | vehicle_type_code_4 | vehicle_type_code_5
8996 | 9990 | 2020-11-02 | 12:20:00 | Queens | 11434 | 40.656160 | -73.76736 | Rockaway Boulevard | Brewer Boulevard | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified | Unspecified
8997 | 9991 | 2020-11-04 | 18:00:00 | Bronx | 10468 | 40.860850 | -73.90545 | Aqueduct Avenue | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified | Unspecified
8998 | 9992 | 2020-11-12 | 18:20:00 | Queens | 11433 | 40.704388 | -73.77917 | 180 Street | 105 Avenue | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified
8999 | 9993 | 2020-11-18 | 07:30:00 | Manhattan | 10031 | 40.829020 | -73.94485 | Amsterdam Avenue | West 151 Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified
9000 | 9994 | 2020-11-13 | 21:43:00 | Brooklyn | 11217 | 40.686500 | -73.98787 | Hoyt Street | Dean Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified
9001 | 9995 | 2020-11-04 | 12:10:00 | Unknown | -1 | 40.658535 | -73.97328 | West Drive | Center Drive | 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Bike | Unspecified | Unspecified | Unspecified | Unspecified
9002 | 9996 | 2020-11-18 | 09:15:00 | Queens | 11691 | 40.598297 | -73.74828 | 14-05 New Haven Avenue | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified | Unspecified
9003 | 9997 | 2020-11-04 | 11:24:00 | Brooklyn | 11218 | 40.640415 | -73.98593 | 39 Street | Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Box Truck | Unspecified | Unspecified | Unspecified
9004 | 9998 | 2020-11-13 | 12:05:00 | Queens | 11427 | 40.735300 | -73.73681 | 82-25 234 Street | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Tire Failure/Inadequate | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Sedan | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified
9005 | 9999 | 2020-11-03 | 08:00:00 | Queens | 11368 | 40.750225 | -73.85515 | 111 Street | 42 Avenue | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Sedan | Unspecified | Unspecified | Unspecified